Open In Colab

ECE M214A Project: Classification of Speaker Regions¶

In this project, we'll train a machine learning algorithm to classify speakers by regional dialect. We will use speech samples from the Corpus of Regional African American Language (CORAAL - https://oraal.uoregon.edu/coraal) with speakers each belonging to one of five different US cities: 1) Rochester, NY (ROC), 2) Lower East Side, Manhattan, NY (LES), 3) Washington DC (DCB), 4) Princeville, NC (PRV), or 5) Valdosta, GA (VLD).

The project files can be downloaded from this link (UCLA Account Required)

To do this, we will first extract features from the audio files and then train a classifier to predict the city of origin of the utterance's speaker. The goal is to extract a feature that contains useful information about regional dialect characteristics.

Authors: Keith Chen, Timothy Do, James Shiffer, Thomas Sison

1. Setup¶

In [184]:
try:
  import google.colab
  IN_COLAB = True
except ImportError:
  IN_COLAB = False
print(f"In CoLab: {IN_COLAB}")

# Libraries
from IPython.display import Audio
import xgboost
import numpy as np
import shap
import pandas as pd
import os
import librosa
import torchaudio
from glob import glob
from tqdm import tqdm
import contextlib
import joblib
import math
import time
from sklearn import metrics
import matplotlib.pyplot as plt

# Clone the Repo if in CoLab & Install Dependencies
if(IN_COLAB):
  os.system('git clone https://github.com/dotimothy/M214A-Project.git')
  os.system('mv -v ./M214A-Project/* .')
  os.system('rm -rf M214A-Project')
  os.system('pip install -r requirements.txt')
if(not(os.path.exists('./W24 ECE M214A Project.zip'))):
  if(IN_COLAB):
    !gdown 1abjWNLbLzwJJ2qv_V0CfpfeMzs56sSut
  else:
    print('Please Download the Project Data (W24 ECE M214A Project.zip) and Try Again')
if(not(os.path.exists('./W24 ECE M214A Project/'))):
  if(IN_COLAB):
    !unzip "W24 ECE M214A Project.zip"
  else:
    print('Please Unzip the Project Data (W24 ECE M214A Project.zip) to (W24 ECE M214A Project/) and Try Again')
project_dir =  './W24 ECE M214A Project'
In CoLab: False

2. Getting familiar with the data¶

Let's take a moment to understand the data. The original CORAAL dataset consists of ~85 different speakers, each from one of five cities. The audio files are named with the convention DCB_se1_ag1_f_03: here, DCB is the city code, se1 denotes the socioeconomic group of the speaker, ag1 denotes the speaker's age group, f denotes female, and 03 is the participant number. Together, these identifiers uniquely identify the speaker.

The dataset has been preprocessed to include only audio segments at least 10 seconds in length. Each segment is numbered with an appended _seg_number tag.
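A quick sketch of splitting a segment filename into its labeled fields. The city/socioeconomic/age/gender/participant layout follows the convention above; the trailing "_1_seg_3" part (recording and segment indices) is assumed from the example files used later in this notebook.

```python
# Hypothetical parsing helper -- not part of the project code.
name = "DCB_se1_ag1_f_03_1_seg_3"
parts = name.split("_")

# First five fields follow the naming convention described above
city, ses_group, age_group, gender, participant = parts[:5]
# Last field is the segment number appended during preprocessing
segment = parts[-1]

print(city, ses_group, age_group, gender, participant, segment)
```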

You can also try listening to any segment like this:

In [2]:
sr = 44100

# Audio(filename= "drive/MyDrive/project_data/train_clean/DCB_se1_ag1_f_03_1_seg_3.wav", rate=sr)
Audio(filename= f"{project_dir}/project_data/train_clean/DCB_se1_ag1_f_03_1_seg_3.wav", rate=sr)
Out[2]:
[audio playback widget]

The original dataset has also been split into a train and test set. The test set has been further split, with a portion corrupted with the addition of noise:

In [3]:
sr = 44100

# Audio(filename= "drive/MyDrive/project_data/test_noisy/LES_se0_ag3_f_01_1_seg_57.wav", rate=sr)
Audio(filename= f"{project_dir}/project_data/test_noisy/LES_se0_ag3_f_01_1_seg_57.wav", rate=sr)
Out[3]:
[audio playback widget]

3. Feature Extraction¶

As a baseline, we will use the average MFCC values over time, computed with the Librosa Python library. Your job is to choose better features to improve performance on both the clean and noisy data. M214A Group 5 also added multiprocessing (via joblib) to accelerate feature extraction, along with options to include the first and second derivatives (deltas) of the MFCC features.

We first define a pair of functions to create features and labels for our classification model:

In [333]:
def extract_feature_mfcc(audio_file, n_mfcc=13,mfcc_1st=True,mfcc_2nd=False):

  '''
  Extract MFCC coefficients, optionally with first- and second-order deltas
  '''

  audio,fs = torchaudio.load(audio_file)
  audio = audio.numpy().reshape(-1)

  # mfcc
  mfccs = librosa.feature.mfcc(y=audio, sr=fs, n_mfcc=n_mfcc)
  feat_out = np.mean(mfccs,axis=1)
  if(mfcc_1st):
    mfccs_delta1 = librosa.feature.delta(mfccs,order=1)
    delta1_out = np.mean(mfccs_delta1,axis=1)
    feat_out = np.concatenate((feat_out,delta1_out),axis=None)
  if(mfcc_2nd):
    mfccs_delta2 = librosa.feature.delta(mfccs,order=2)
    delta2_out = np.mean(mfccs_delta2,axis=1)
    feat_out = np.concatenate((feat_out,delta2_out),axis=None)
  return feat_out


def get_label(file_name):
  '''
  Function to retrieve output labels from filenames
  '''
  if 'ROC' in file_name:
    label=0
  elif 'LES' in file_name:
    label=1
  elif 'DCB' in file_name:
    label=2
  elif 'PRV' in file_name:
    label=3
  elif 'VLD' in file_name:
    label=4
  else:
    raise ValueError('invalid file name')
  return label
In [5]:
test_wav = f'{project_dir}/project_data/train_clean/DCB_se1_ag1_f_01_1_seg_0.wav'
print(extract_feature_mfcc(test_wav,13,False,False).shape)
(13,)
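Averaging over the time axis is what makes the feature length independent of clip duration: librosa returns MFCCs as an (n_mfcc, n_frames) matrix, and n_frames varies per utterance. A minimal numpy sketch (synthetic matrices, not real MFCCs):

```python
import numpy as np

# Synthetic stand-ins for MFCC matrices of shape (n_mfcc, n_frames);
# n_frames varies with utterance duration, so the raw matrices differ in size.
mfccs_short = np.ones((13, 431))
mfccs_long = np.ones((13, 900))

# Averaging over the frame axis collapses both to fixed-length 13-dim vectors
print(mfccs_short.mean(axis=1).shape, mfccs_long.mean(axis=1).shape)
```

With the delta options enabled, the same pooling is applied to each delta matrix and the results are concatenated, so the final feature length is n_mfcc, 2·n_mfcc, or 3·n_mfcc.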
In [183]:
# Allows for Parallel Progress Bar with Joblib
# Source: https://stackoverflow.com/questions/24983493/tracking-progress-of-joblib-parallel-execution/58936697#58936697
@contextlib.contextmanager
def tqdm_joblib(tqdm_object):
    class TqdmBatchCompletionCallback(joblib.parallel.BatchCompletionCallBack):
        def __call__(self, *args, **kwargs):
            tqdm_object.update(n=self.batch_size)
            return super().__call__(*args, **kwargs)

    old_batch_callback = joblib.parallel.BatchCompletionCallBack
    joblib.parallel.BatchCompletionCallBack = TqdmBatchCompletionCallback
    try:
        yield tqdm_object
    finally:
        joblib.parallel.BatchCompletionCallBack = old_batch_callback
        tqdm_object.close()

# Individual Process Function
def extract_feature_list(wav_files,extract_feature,args):
    wav_files.sort()
    wav_feat = []
    for wav in tqdm(wav_files):        
        wav_feat.append(extract_feature(wav,*args))
    return wav_feat

# Function to Divide a List
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# Overall Feature Extraction 
def extract_feature_dir(wav_files,extract_feature,args,accel=False):
    wav_files.sort()
    wav_feat = []
    if(accel):
        num_processes = round(joblib.cpu_count())
        chunk_size  = max(math.ceil(len(wav_files)/num_processes),num_processes)
        wav_chunks = list(chunks(wav_files,chunk_size))
        # To Implement Multi-Processing
        with tqdm_joblib(tqdm(total=len(wav_chunks))) as pb:
            results = joblib.Parallel(n_jobs=num_processes)(joblib.delayed(extract_feature_list)(wav_chunk,extract_feature,args) for wav_chunk in wav_chunks)
            wav_feat = [x for xs in results for x in xs]
    else:
        wav_feat = extract_feature_list(wav_files,extract_feature,args)
    return wav_feat

def get_label_dir(wav_files):
    return [get_label(wav) for wav in tqdm(wav_files)]
In [90]:
# Test Parallelism
test_clean_files = glob(f'{project_dir}/project_data/test_clean/*.wav')
test_clean_files.sort()
args = (13,True,False)
start = time.time()
test_clean_feat = extract_feature_dir(test_clean_files,extract_feature_mfcc,args,False)
singleTime = time.time()-start
start = time.time()
test_clean_feat = extract_feature_dir(test_clean_files,extract_feature_mfcc,args,True)
multiTime = time.time() - start
print(f'Single-Core Extraction: {singleTime} s')
print(f'Multi-Core Extraction: {multiTime} s')
print(f'Speedup: {(singleTime/multiTime):.2f}x')
100%|████████████████████████████████████████████████████████████████████████████████| 447/447 [00:25<00:00, 17.74it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 19/19 [00:09<00:00,  1.96it/s]
Single-Core Extraction: 25.20493197441101 s
Multi-Core Extraction: 9.680119037628174 s
Speedup: 2.60x

Let us now call these functions to extract the features and labels from the train_clean, test_clean, and test_noisy directories:

In [345]:
# Directories
feature_dir = './features'
os.makedirs(feature_dir, exist_ok=True)
train_files = glob(f'{project_dir}/project_data/train_clean/*.wav')
train_files.sort()
test_clean_files = glob(f'{project_dir}/project_data/test_clean/*.wav')
test_clean_files.sort()
test_noisy_files = glob(f'{project_dir}/project_data/test_noisy/*.wav')
test_noisy_files.sort()

# Feature Extraction
accel = True
n_mfcc = 25
mfcc_1st = True
mfcc_2nd = True
args = (n_mfcc,mfcc_1st,mfcc_2nd)

trainMFCC = f'{feature_dir}/mfcc{n_mfcc}{"_delta1" if mfcc_1st else ""}{"_delta2" if mfcc_2nd else ""}_train.csv'
testCleanMFCC = f'{feature_dir}/mfcc{n_mfcc}{"_delta1" if mfcc_1st else ""}{"_delta2" if mfcc_2nd else ""}_test_clean.csv'
testNoisyMFCC = f'{feature_dir}/mfcc{n_mfcc}{"_delta1" if mfcc_1st else ""}{"_delta2" if mfcc_2nd else ""}_test_noisy.csv'
if(not(os.path.exists(trainMFCC))):
    train_feat = extract_feature_dir(train_files,extract_feature_mfcc,args,accel)
train_label = get_label_dir(train_files)
if(not(os.path.exists(testCleanMFCC))):
    test_clean_feat = extract_feature_dir(test_clean_files,extract_feature_mfcc,args,accel)
test_clean_label = get_label_dir(test_clean_files)
if(not(os.path.exists(testNoisyMFCC))):
    test_noisy_feat = extract_feature_dir(test_noisy_files,extract_feature_mfcc,args,accel)
test_noisy_label = get_label_dir(test_noisy_files)
100%|█████████████████████████████████████████████████████████████████████████| 4372/4372 [00:00<00:00, 2189290.48it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 447/447 [00:00<?, ?it/s]
100%|████████████████████████████████████████████████████████████████████████████| 347/347 [00:00<00:00, 346859.74it/s]

4. Model Training and Predictions¶

Now we'll train the backend system to predict the regions from the input features. We'll use an XGBoost gradient-boosted decision tree for this. An advantage of this model is that we can parse the decision trees and measure the impact of each feature on the end result, which helps with explainability.
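For intuition on the gradient-boosting part, here is a toy one-dimensional regression sketch (constant "stumps" standing in for trees), not the actual multiclass XGBoost model trained below: each stage fits the residual left by the current ensemble, and the prediction is the sum of all stages scaled by a learning rate.

```python
import numpy as np

# Toy gradient-boosting sketch with depth-0 "trees" (constant stumps) --
# an illustration of the stage-wise additive idea only.
y = np.array([3.0, 5.0, 9.0])
pred = np.zeros_like(y)

for _ in range(50):
    residual = y - pred          # what the ensemble still gets wrong
    stump = residual.mean()      # each "tree" fits the current residual
    pred += 0.5 * stump          # shrink each stage by a learning rate of 0.5

# With constant stumps the ensemble converges to the target mean (17/3)
print(np.round(pred, 3))
```

Real boosted trees replace the constant stump with a tree that splits on the features, which is what lets the model capture per-feature structure.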

In [346]:
# MFCC Features
feat_names = ['mfcc_' + str(n) for n in range(n_mfcc)]
if(mfcc_1st):
    feat_names += ['mfcc_delta1_' + str(n) for n in range(n_mfcc)]
if(mfcc_2nd):
    feat_names += ['mfcc_delta2_' + str(n) for n in range(n_mfcc)]

# Prepare Training Set
if(not(os.path.exists(trainMFCC))):
    train_feat_df = pd.DataFrame(data=np.stack(train_feat), columns=feat_names)
    train_feat_df.to_csv(trainMFCC)
else: 
    train_feat_df = pd.read_csv(trainMFCC,index_col=0)
y_train = np.stack(train_label)

# Prepare Test Clean Set
if(not(os.path.exists(testCleanMFCC))):
    test_clean_feat_df = pd.DataFrame(data=np.stack(test_clean_feat), columns=feat_names)
    test_clean_feat_df.to_csv(testCleanMFCC)
else:
    test_clean_feat_df = pd.read_csv(testCleanMFCC,index_col=0)
y_test_clean = np.stack(test_clean_label)

# Prepare Test Noisy Set
if(not(os.path.exists(testNoisyMFCC))):
    test_noisy_feat_df = pd.DataFrame(data=np.stack(test_noisy_feat), columns=feat_names)
    test_noisy_feat_df.to_csv(testNoisyMFCC)
else:
    test_noisy_feat_df = pd.read_csv(testNoisyMFCC,index_col=0)
y_test_noisy = np.stack(test_noisy_label)


# You could pass the raw feature matrix to xgboost directly, but formatting it
# as a DataFrame gives the features names, which makes the SHAP explainer
# plots easier to read.


model = xgboost.XGBClassifier()
model.fit(train_feat_df,y_train)

print("Train MFCC Clean Acc =", np.sum(y_train==model.predict(train_feat_df))/len(y_train))

print("Test MFCC Clean Acc =", np.sum(y_test_clean==model.predict(test_clean_feat_df))/len(y_test_clean))

print("Test MFCC Noisy Acc =", np.sum(y_test_noisy==model.predict(test_noisy_feat_df))/len(y_test_noisy))
Train MFCC Clean Acc = 1.0
Test MFCC Clean Acc = 0.8881431767337807
Test MFCC Noisy Acc = 0.6282420749279539

5. Interpreting Results and Explainability¶

To see the impact different features have on the model, we create a plot of the feature importances. The features are listed top to bottom in order of how important they were to the decision.
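The ranking in the summary plot is by mean absolute SHAP value per feature across samples. A small numpy sketch of that aggregation, using a hypothetical SHAP matrix (rows are samples, columns are features) rather than real explainer output:

```python
import numpy as np

# Hypothetical SHAP values -- rows = samples, columns = features.
shap_vals = np.array([[ 0.5, -0.1,  0.0],
                      [-0.4,  0.2,  0.1],
                      [ 0.6, -0.3,  0.0]])

importance = np.abs(shap_vals).mean(axis=0)  # per-feature mean |SHAP|
order = np.argsort(importance)[::-1]         # most important feature first
print(order.tolist())
```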

In [298]:
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(train_feat_df)
shap.summary_plot(shap_values, train_feat_df)
[SHAP summary plot of MFCC feature importances]

We can also view confusion matrices of the mispredictions on the clean and noisy test sets:
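Entry (i, j) of the matrix counts samples of true class i predicted as class j; a minimal numpy version of what metrics.confusion_matrix computes:

```python
import numpy as np

# cm[i, j] = number of samples with true class i predicted as class j
def confusion_matrix_np(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy labels for three classes (not project data)
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
print(confusion_matrix_np(y_true, y_pred, 3))
```

The diagonal holds the correct predictions, so accuracy is the trace divided by the total sample count.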

In [299]:
confusion_matrix_clean = metrics.confusion_matrix(y_test_clean, model.predict(test_clean_feat_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_clean, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()
[confusion matrix, clean test set]
In [300]:
confusion_matrix_noisy = metrics.confusion_matrix(y_test_noisy, model.predict(test_noisy_feat_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_noisy, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()
[confusion matrix, noisy test set]

6. Opensmile¶

Feature Set GeMAPSv01b¶

In [13]:
import opensmile
smile = opensmile.Smile(
      feature_set=opensmile.FeatureSet.GeMAPSv01b,
      feature_level=opensmile.FeatureLevel.Functionals,
  )
In [14]:
print(f'Opensmile Features: {smile.feature_names}')
print(f'\nTotal Number of Features: {len(smile.feature_names)}')
Opensmile Features: ['F0semitoneFrom27.5Hz_sma3nz_amean', 'F0semitoneFrom27.5Hz_sma3nz_stddevNorm', 'F0semitoneFrom27.5Hz_sma3nz_percentile20.0', 'F0semitoneFrom27.5Hz_sma3nz_percentile50.0', 'F0semitoneFrom27.5Hz_sma3nz_percentile80.0', 'F0semitoneFrom27.5Hz_sma3nz_pctlrange0-2', 'F0semitoneFrom27.5Hz_sma3nz_meanRisingSlope', 'F0semitoneFrom27.5Hz_sma3nz_stddevRisingSlope', 'F0semitoneFrom27.5Hz_sma3nz_meanFallingSlope', 'F0semitoneFrom27.5Hz_sma3nz_stddevFallingSlope', 'loudness_sma3_amean', 'loudness_sma3_stddevNorm', 'loudness_sma3_percentile20.0', 'loudness_sma3_percentile50.0', 'loudness_sma3_percentile80.0', 'loudness_sma3_pctlrange0-2', 'loudness_sma3_meanRisingSlope', 'loudness_sma3_stddevRisingSlope', 'loudness_sma3_meanFallingSlope', 'loudness_sma3_stddevFallingSlope', 'jitterLocal_sma3nz_amean', 'jitterLocal_sma3nz_stddevNorm', 'shimmerLocaldB_sma3nz_amean', 'shimmerLocaldB_sma3nz_stddevNorm', 'HNRdBACF_sma3nz_amean', 'HNRdBACF_sma3nz_stddevNorm', 'logRelF0-H1-H2_sma3nz_amean', 'logRelF0-H1-H2_sma3nz_stddevNorm', 'logRelF0-H1-A3_sma3nz_amean', 'logRelF0-H1-A3_sma3nz_stddevNorm', 'F1frequency_sma3nz_amean', 'F1frequency_sma3nz_stddevNorm', 'F1bandwidth_sma3nz_amean', 'F1bandwidth_sma3nz_stddevNorm', 'F1amplitudeLogRelF0_sma3nz_amean', 'F1amplitudeLogRelF0_sma3nz_stddevNorm', 'F2frequency_sma3nz_amean', 'F2frequency_sma3nz_stddevNorm', 'F2amplitudeLogRelF0_sma3nz_amean', 'F2amplitudeLogRelF0_sma3nz_stddevNorm', 'F3frequency_sma3nz_amean', 'F3frequency_sma3nz_stddevNorm', 'F3amplitudeLogRelF0_sma3nz_amean', 'F3amplitudeLogRelF0_sma3nz_stddevNorm', 'alphaRatioV_sma3nz_amean', 'alphaRatioV_sma3nz_stddevNorm', 'hammarbergIndexV_sma3nz_amean', 'hammarbergIndexV_sma3nz_stddevNorm', 'slopeV0-500_sma3nz_amean', 'slopeV0-500_sma3nz_stddevNorm', 'slopeV500-1500_sma3nz_amean', 'slopeV500-1500_sma3nz_stddevNorm', 'alphaRatioUV_sma3nz_amean', 'hammarbergIndexUV_sma3nz_amean', 'slopeUV0-500_sma3nz_amean', 'slopeUV500-1500_sma3nz_amean', 'loudnessPeaksPerSec', 
'VoicedSegmentsPerSec', 'MeanVoicedSegmentLengthSec', 'StddevVoicedSegmentLengthSec', 'MeanUnvoicedSegmentLength', 'StddevUnvoicedSegmentLength']

Total Number of Features: 62
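The Functionals level summarizes each per-frame low-level descriptor (LLD) track into per-utterance statistics. For instance, amean and stddevNorm on a hypothetical F0 track (illustrative values, not opensmile output):

```python
import numpy as np

# An LLD is a per-frame track; a "functional" collapses it into
# per-utterance statistics. Hypothetical F0 values below.
f0_track = np.array([100.0, 110.0, 105.0, 95.0, 90.0])  # per-frame F0 (LLD)

amean = f0_track.mean()                  # the "amean" functional
stddev_norm = f0_track.std() / amean     # "stddevNorm": stddev / mean

print(amean, round(stddev_norm, 4))
```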

Feature Set eGeMAPSv02¶

In [15]:
smile2 = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)
In [16]:
print(f'OpenSmile2 Features: {smile2.feature_names}')
print(f'\nTotal Number of Features: {len(smile2.feature_names)}')
OpenSmile2 Features: ['Loudness_sma3', 'alphaRatio_sma3', 'hammarbergIndex_sma3', 'slope0-500_sma3', 'slope500-1500_sma3', 'spectralFlux_sma3', 'mfcc1_sma3', 'mfcc2_sma3', 'mfcc3_sma3', 'mfcc4_sma3', 'F0semitoneFrom27.5Hz_sma3nz', 'jitterLocal_sma3nz', 'shimmerLocaldB_sma3nz', 'HNRdBACF_sma3nz', 'logRelF0-H1-H2_sma3nz', 'logRelF0-H1-A3_sma3nz', 'F1frequency_sma3nz', 'F1bandwidth_sma3nz', 'F1amplitudeLogRelF0_sma3nz', 'F2frequency_sma3nz', 'F2bandwidth_sma3nz', 'F2amplitudeLogRelF0_sma3nz', 'F3frequency_sma3nz', 'F3bandwidth_sma3nz', 'F3amplitudeLogRelF0_sma3nz']

Total Number of Features: 25
In [17]:
# smileType: 0 (smile), 1 (smile2)
def extract_smile(wav,smileType):

  audio,sample_fs = torchaudio.load(wav)

  sample_audio = audio.numpy().reshape(-1)

  feature_sets = [opensmile.FeatureSet.GeMAPSv01b,opensmile.FeatureSet.eGeMAPSv02]
  feature_levels = [opensmile.FeatureLevel.Functionals,opensmile.FeatureLevel.LowLevelDescriptors]
  smile = opensmile.Smile(
        feature_set=feature_sets[smileType],
        feature_level=feature_levels[smileType]
    )

  # Note: with FeatureLevel.Functionals this yields one row per signal, but
  # with LowLevelDescriptors it yields one row per frame, so y.iloc[0] keeps
  # only the first frame's LLD values.
  y = smile.process_signal(
      sample_audio,
      sample_fs
  )
  return np.array(y.iloc[0])
In [18]:
test_wav = f'{project_dir}/project_data/train_clean/DCB_se1_ag1_f_01_1_seg_0.wav'
print(extract_smile(test_wav,0).shape)
print(extract_smile(test_wav,1).shape)
(62,)
(25,)
In [19]:
# Extracting Smile Features
trainOS = f'{feature_dir}/os_train.csv'
testCleanOS = f'{feature_dir}/os_test_clean.csv'
testNoisyOS = f'{feature_dir}/os_test_noisy.csv'

args = (0,) # must be a tuple so it can be unpacked with *args; (0) is just the int 0
accel = True

if(not(os.path.exists(trainOS))):
    train_feat_os = extract_feature_dir(train_files,extract_smile,args,accel)
if(not(os.path.exists(testCleanOS))):
    test_clean_feat_os = extract_feature_dir(test_clean_files,extract_smile,args,accel)
if(not(os.path.exists(testNoisyOS))):
    test_noisy_feat_os = extract_feature_dir(test_noisy_files,extract_smile,args,accel)
In [20]:
# Evaluating OS Feature
feat_names_os  = list(smile.feature_names)
if(not(os.path.exists(trainOS))):
    train_feat_os_df = pd.DataFrame(data=np.stack(train_feat_os), columns=feat_names_os)
    train_feat_os_df.to_csv(trainOS)
else:
    train_feat_os_df = pd.read_csv(trainOS,index_col=0)
if(not(os.path.exists(testCleanOS))):
    test_clean_feat_os_df = pd.DataFrame(data=np.stack(test_clean_feat_os), columns=feat_names_os)
    test_clean_feat_os_df.to_csv(testCleanOS)
else: 
    test_clean_feat_os_df = pd.read_csv(testCleanOS,index_col=0)
if(not(os.path.exists(testNoisyOS))):
    test_noisy_feat_os_df = pd.DataFrame(data=np.stack(test_noisy_feat_os), columns=feat_names_os)
    test_noisy_feat_os_df.to_csv(testNoisyOS)
else:
    test_noisy_feat_os_df = pd.read_csv(testNoisyOS,index_col=0)

model_os = xgboost.XGBClassifier()
model_os.fit(train_feat_os_df,y_train)

print("Train Clean Opensmile Acc =", np.sum(y_train==model_os.predict(train_feat_os_df))/len(y_train))

print("Test Clean Opensmile Acc =", np.sum(y_test_clean==model_os.predict(test_clean_feat_os_df))/len(y_test_clean))

print("Test Noisy Opensmile Acc =", np.sum(y_test_noisy==model_os.predict(test_noisy_feat_os_df))/len(y_test_noisy))

print()
explainer = shap.TreeExplainer(model_os)
shap_values_os = explainer.shap_values(train_feat_os_df)
shap.summary_plot(shap_values_os, train_feat_os_df)

print()
confusion_matrix_clean_os = metrics.confusion_matrix(y_test_clean, model_os.predict(test_clean_feat_os_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_clean_os, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()

print()
confusion_matrix_noisy_os = metrics.confusion_matrix(y_test_noisy, model_os.predict(test_noisy_feat_os_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_noisy_os, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()
Train Clean Opensmile Acc = 1.0
Test Clean Opensmile Acc = 0.9261744966442953
Test Noisy Opensmile Acc = 0.49279538904899134

[SHAP summary plot of GeMAPSv01b feature importances]

[confusion matrix, clean test set]

[confusion matrix, noisy test set]
In [21]:
# Extracting Smile2 Features
trainOS2 = f'{feature_dir}/os2_train.csv'
testCleanOS2 = f'{feature_dir}/os2_test_clean.csv'
testNoisyOS2 = f'{feature_dir}/os2_test_noisy.csv'

args = (1,) # must be a tuple so it can be unpacked with *args; (1) is just the int 1
accel = True

if(not(os.path.exists(trainOS2))):
    train_feat_os2 = extract_feature_dir(train_files,extract_smile,args,accel)
if(not(os.path.exists(testCleanOS2))):
    test_clean_feat_os2 = extract_feature_dir(test_clean_files,extract_smile,args,accel)
if(not(os.path.exists(testNoisyOS2))):
    test_noisy_feat_os2 = extract_feature_dir(test_noisy_files,extract_smile,args,accel)
In [22]:
# Evaluating OS2 Features
feat_names_os2  = list(smile2.feature_names)
if(not(os.path.exists(trainOS2))):
    train_feat_os2_df = pd.DataFrame(data=np.stack(train_feat_os2), columns=feat_names_os2)
    train_feat_os2_df.to_csv(trainOS2)
else:
    train_feat_os2_df = pd.read_csv(trainOS2,index_col=0)
if(not(os.path.exists(testCleanOS2))):
    test_clean_feat_os2_df = pd.DataFrame(data=np.stack(test_clean_feat_os2), columns=feat_names_os2)
    test_clean_feat_os2_df.to_csv(testCleanOS2)
else: 
    test_clean_feat_os2_df = pd.read_csv(testCleanOS2,index_col=0)
if(not(os.path.exists(testNoisyOS2))):
    test_noisy_feat_os2_df = pd.DataFrame(data=np.stack(test_noisy_feat_os2), columns=feat_names_os2)
    test_noisy_feat_os2_df.to_csv(testNoisyOS2)
else:
    test_noisy_feat_os2_df = pd.read_csv(testNoisyOS2,index_col=0)

model_os2 = xgboost.XGBClassifier()
model_os2.fit(train_feat_os2_df,y_train)

print("Train Clean Opensmile 2 Acc =", np.sum(y_train==model_os2.predict(train_feat_os2_df))/len(y_train))

print("Test Clean Opensmile 2 Acc =", np.sum(y_test_clean==model_os2.predict(test_clean_feat_os2_df))/len(y_test_clean))

print("Test Noisy Opensmile 2 Acc =", np.sum(y_test_noisy==model_os2.predict(test_noisy_feat_os2_df))/len(y_test_noisy))

print()
explainer = shap.TreeExplainer(model_os2)
shap_values_os2 = explainer.shap_values(train_feat_os2_df)
shap.summary_plot(shap_values_os2, train_feat_os2_df)

print()
confusion_matrix_clean_os2 = metrics.confusion_matrix(y_test_clean, model_os2.predict(test_clean_feat_os2_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_clean_os2, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()

print()
confusion_matrix_noisy_os2 = metrics.confusion_matrix(y_test_noisy, model_os2.predict(test_noisy_feat_os2_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_noisy_os2, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()
Train Clean Opensmile 2 Acc = 1.0
Test Clean Opensmile 2 Acc = 0.7046979865771812
Test Noisy Opensmile 2 Acc = 0.46397694524495675

[SHAP summary plot of eGeMAPSv02 feature importances]

[confusion matrix, clean test set]

[confusion matrix, noisy test set]

7. Spafe (Advanced Features)¶

Gammatone Frequency Cepstral Coefficients (GFCC)¶

In [208]:
import spafe.features.gfcc

def extract_feature_gfcc(audio_file, n_gfcc=13):

  '''
  Extract GFCC features
  '''

  audio,fs = torchaudio.load(audio_file)
  audio = audio.numpy().reshape(-1)

  # Use the sample rate read from the file rather than a hardcoded value
  gfccs = spafe.features.gfcc.gfcc(sig=audio, fs=fs, num_ceps=n_gfcc)
  feat_out = np.mean(gfccs,axis=0)

  return feat_out
In [209]:
test_wav = f'{project_dir}/project_data/train_clean/DCB_se1_ag1_f_01_1_seg_0.wav'
print(extract_feature_gfcc(test_wav,13))
[ 3.52699334e-02 -5.40729009e-05 -6.92793861e-03 -6.57449170e-03
 -4.62176106e-03  1.73811028e-03  1.10878099e-03  3.62485582e-03
 -9.84202129e-04 -1.08928857e-03 -3.15170819e-04 -8.84683817e-04
  4.37231553e-04]
In [191]:
# Feature Extraction
accel = True
n_gfcc = 13
args = (n_gfcc,)

trainGFCC = f'{feature_dir}/gfcc{n_gfcc}_train.csv'
testCleanGFCC = f'{feature_dir}/gfcc{n_gfcc}_test_clean.csv'
testNoisyGFCC = f'{feature_dir}/gfcc{n_gfcc}_test_noisy.csv'
if(not(os.path.exists(trainGFCC))):
    train_feat = extract_feature_dir(train_files,extract_feature_gfcc,args,accel)
train_label = get_label_dir(train_files)
if(not(os.path.exists(testCleanGFCC))):
    test_clean_feat = extract_feature_dir(test_clean_files,extract_feature_gfcc,args,accel)
test_clean_label = get_label_dir(test_clean_files)
if(not(os.path.exists(testNoisyGFCC))):
    test_noisy_feat = extract_feature_dir(test_noisy_files,extract_feature_gfcc,args,accel)
test_noisy_label = get_label_dir(test_noisy_files)
100%|█████████████████████████████████████████████████████████████████████████| 4372/4372 [00:00<00:00, 2186679.83it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 447/447 [00:00<?, ?it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 347/347 [00:00<?, ?it/s]
In [193]:
# GFCC Features
feat_names = ['gfcc_' + str(n) for n in range(n_gfcc)]

# Prepare Training Set
if(not(os.path.exists(trainGFCC))):
    train_feat_df = pd.DataFrame(data=np.stack(train_feat), columns=feat_names)
    train_feat_df.to_csv(trainGFCC)
else: 
    train_feat_df = pd.read_csv(trainGFCC,index_col=0)
y_train = np.stack(train_label)

# Prepare Test Clean Set
if(not(os.path.exists(testCleanGFCC))):
    test_clean_feat_df = pd.DataFrame(data=np.stack(test_clean_feat), columns=feat_names)
    test_clean_feat_df.to_csv(testCleanGFCC)
else:
    test_clean_feat_df = pd.read_csv(testCleanGFCC,index_col=0)
y_test_clean = np.stack(test_clean_label)

# Prepare Test Noisy Set
if(not(os.path.exists(testNoisyGFCC))):
    test_noisy_feat_df = pd.DataFrame(data=np.stack(test_noisy_feat), columns=feat_names)
    test_noisy_feat_df.to_csv(testNoisyGFCC)
else:
    test_noisy_feat_df = pd.read_csv(testNoisyGFCC,index_col=0)
y_test_noisy = np.stack(test_noisy_label)


model = xgboost.XGBClassifier()
model.fit(train_feat_df,y_train)

print("Train GFCC Clean Acc =", np.sum(y_train==model.predict(train_feat_df))/len(y_train))

print("Test GFCC Clean Acc =", np.sum(y_test_clean==model.predict(test_clean_feat_df))/len(y_test_clean))

print("Test GFCC Noisy Acc =", np.sum(y_test_noisy==model.predict(test_noisy_feat_df))/len(y_test_noisy))

print()
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(train_feat_df)
shap.summary_plot(shap_values, train_feat_df)

print()
confusion_matrix_clean = metrics.confusion_matrix(y_test_clean, model.predict(test_clean_feat_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_clean, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()

print()
confusion_matrix_noisy= metrics.confusion_matrix(y_test_noisy, model.predict(test_noisy_feat_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_noisy, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()
Train GFCC Clean Acc = 1.0
Test GFCC Clean Acc = 0.6644295302013423
Test GFCC Noisy Acc = 0.8270893371757925

[SHAP summary plot of GFCC feature importances]

[confusion matrix, clean test set]

[confusion matrix, noisy test set]

Perceptual Linear Prediction Coefficients (PLP)¶

In [194]:
import spafe.features.rplp 

def extract_feature_plp(audio_file, order=13):

  '''
  Extract PLP features
  '''

  audio,fs = torchaudio.load(audio_file)
  audio = audio.numpy().reshape(-1)

  # Use the sample rate read from the file rather than a hardcoded value
  plps = spafe.features.rplp.plp(sig=audio, fs=fs, order=order)
  feat_out = np.mean(plps,axis=0)

  return feat_out
In [210]:
test_wav = f'{project_dir}/project_data/train_clean/DCB_se1_ag1_f_01_1_seg_0.wav'
print(extract_feature_plp(test_wav,13))
[-8.89161729e+01 -1.00325089e+00  1.09327901e-03  1.12161967e-03
  1.12665492e-03  1.11661652e-03  1.09560193e-03  1.06600807e-03
  1.02937231e-03  9.86705184e-04  9.38589743e-04  8.85097211e-04
  3.38410298e-03]
In [203]:
# Feature Extraction
accel = True
n_plp = 24
args = (n_plp,)

trainPLP = f'{feature_dir}/plp{n_plp}_train.csv'
testCleanPLP = f'{feature_dir}/plp{n_plp}_test_clean.csv'
testNoisyPLP = f'{feature_dir}/plp{n_plp}_test_noisy.csv'
if(not(os.path.exists(trainPLP))):
    train_feat = extract_feature_dir(train_files,extract_feature_plp,args,accel)
train_label = get_label_dir(train_files)
if(not(os.path.exists(testCleanPLP))):
    test_clean_feat = extract_feature_dir(test_clean_files,extract_feature_plp,args,accel)
test_clean_label = get_label_dir(test_clean_files)
if(not(os.path.exists(testNoisyPLP))):
    test_noisy_feat = extract_feature_dir(test_noisy_files,extract_feature_plp,args,accel)
test_noisy_label = get_label_dir(test_noisy_files)
100%|██████████████████████████████████████████████████████████████████████████████████| 24/24 [09:59<00:00, 24.99s/it]
100%|█████████████████████████████████████████████████████████████████████████| 4372/4372 [00:00<00:00, 2185376.84it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 19/19 [01:13<00:00,  3.89s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████| 447/447 [00:00<?, ?it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 15/15 [01:15<00:00,  5.06s/it]
100%|████████████████████████████████████████████████████████████████████████████| 347/347 [00:00<00:00, 347025.15it/s]
In [204]:
# PLP Features
feat_names = ['plp_' + str(n) for n in range(n_plp)]

# Prepare Training Set
if(not(os.path.exists(trainPLP))):
    train_feat_df = pd.DataFrame(data=np.stack(train_feat), columns=feat_names)
    train_feat_df.to_csv(trainPLP)
else: 
    train_feat_df = pd.read_csv(trainPLP,index_col=0)
y_train = np.stack(train_label)

# Prepare Test Clean Set
if(not(os.path.exists(testCleanPLP))):
    test_clean_feat_df = pd.DataFrame(data=np.stack(test_clean_feat), columns=feat_names)
    test_clean_feat_df.to_csv(testCleanPLP)
else:
    test_clean_feat_df = pd.read_csv(testCleanPLP,index_col=0)
y_test_clean = np.stack(test_clean_label)

# Prepare Test Noisy Set
if(not(os.path.exists(testNoisyPLP))):
    test_noisy_feat_df = pd.DataFrame(data=np.stack(test_noisy_feat), columns=feat_names)
    test_noisy_feat_df.to_csv(testNoisyPLP)
else:
    test_noisy_feat_df = pd.read_csv(testNoisyPLP,index_col=0)
y_test_noisy = np.stack(test_noisy_label)


model = xgboost.XGBClassifier()
model.fit(train_feat_df,y_train)

print("Train PLP Clean Acc =", np.sum(y_train==model.predict(train_feat_df))/len(y_train))

print("Test PLP Clean Acc =", np.sum(y_test_clean==model.predict(test_clean_feat_df))/len(y_test_clean))

print("Test PLP Noisy Acc =", np.sum(y_test_noisy==model.predict(test_noisy_feat_df))/len(y_test_noisy))

print()
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(train_feat_df)
shap.summary_plot(shap_values, train_feat_df)

print()
confusion_matrix_clean = metrics.confusion_matrix(y_test_clean, model.predict(test_clean_feat_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_clean, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()

print()
confusion_matrix_noisy= metrics.confusion_matrix(y_test_noisy, model.predict(test_noisy_feat_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_noisy, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()
Train PLP Clean Acc = 1.0
Test PLP Clean Acc = 0.7248322147651006
Test PLP Noisy Acc = 0.6195965417867435

[SHAP summary plot of PLP feature importances]

[confusion matrix, clean test set]

[confusion matrix, noisy test set]

Power Normalized Cepstral Coefficients (PNCC)¶

In [229]:
import spafe.features.pncc

def extract_feature_pncc(audio_file, n_pncc=13):

  '''
  Extract PNCC features, mean-pooled over frames into one vector per utterance
  '''

  audio, fs = torchaudio.load(audio_file)
  audio = audio.numpy().reshape(-1)

  # Use the file's own sample rate reported by torchaudio rather than a hard-coded value
  pnccs = spafe.features.pncc.pncc(sig=audio, fs=fs, num_ceps=n_pncc)
  feat_out = np.mean(pnccs, axis=0)

  return feat_out
In [230]:
test_wav = f'{project_dir}/project_data/train_clean/DCB_se1_ag1_f_01_1_seg_0.wav'
print(extract_feature_pncc(test_wav,13))
[ 3.04516268e+00 -3.94841739e-02 -1.46968677e-01 -8.90194553e-02
 -9.12554395e-02 -1.67210619e-04 -4.02488665e-03  4.84571815e-02
 -1.51813209e-02 -1.52985058e-02 -1.63963675e-03 -1.83922195e-02
  4.15217359e-05]
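The mean pooling inside `extract_feature_pncc` is what turns a variable-length utterance into a fixed-length feature vector: spafe returns one row of coefficients per frame, and averaging over the time axis collapses them. A sketch with a simulated frame matrix (the values are arbitrary):

```python
import numpy as np

# Simulated frame-level features: 100 frames x 13 coefficients,
# standing in for the (n_frames, n_ceps) matrix returned by pncc()
frames = np.arange(100 * 13, dtype=float).reshape(100, 13)

# Mean pooling over the time axis gives one fixed-length vector per utterance
utterance_feat = np.mean(frames, axis=0)
print(utterance_feat.shape)  # (13,)
```

This is why utterances of different durations can share one DataFrame: every file maps to the same `n_pncc`-dimensional vector.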
In [233]:
# Feature Extraction
accel = True
n_pncc = 21
args = (n_pncc,)

trainPNCC = f'{feature_dir}/pncc{n_pncc}_train.csv'
testCleanPNCC = f'{feature_dir}/pncc{n_pncc}_test_clean.csv'
testNoisyPNCC = f'{feature_dir}/pncc{n_pncc}_test_noisy.csv'
if(not(os.path.exists(trainPNCC))):
    train_feat = extract_feature_dir(train_files,extract_feature_pncc,args,accel)
train_label = get_label_dir(train_files)
if(not(os.path.exists(testCleanPNCC))):
    test_clean_feat = extract_feature_dir(test_clean_files,extract_feature_pncc,args,accel)
test_clean_label = get_label_dir(test_clean_files)
if(not(os.path.exists(testNoisyPNCC))):
    test_noisy_feat = extract_feature_dir(test_noisy_files,extract_feature_pncc,args,accel)
test_noisy_label = get_label_dir(test_noisy_files)
100%|██████████████████████████████████████████████████████████████████████████████████| 24/24 [06:48<00:00, 17.03s/it]
100%|█████████████████████████████████████████████████████████████████████████| 4372/4372 [00:00<00:00, 2180698.90it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 19/19 [00:49<00:00,  2.59s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████| 447/447 [00:00<?, ?it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 15/15 [00:50<00:00,  3.39s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████| 347/347 [00:00<?, ?it/s]
In [234]:
# PNCC Features
feat_names = ['pncc_' + str(n) for n in range(n_pncc)]

# Prepare Training Set
if(not(os.path.exists(trainPNCC))):
    train_feat_df = pd.DataFrame(data=np.stack(train_feat), columns=feat_names)
    train_feat_df.to_csv(trainPNCC)
else: 
    train_feat_df = pd.read_csv(trainPNCC,index_col=0)
y_train = np.stack(train_label)

# Prepare Test Clean Set
if(not(os.path.exists(testCleanPNCC))):
    test_clean_feat_df = pd.DataFrame(data=np.stack(test_clean_feat), columns=feat_names)
    test_clean_feat_df.to_csv(testCleanPNCC)
else:
    test_clean_feat_df = pd.read_csv(testCleanPNCC,index_col=0)
y_test_clean = np.stack(test_clean_label)

# Prepare Test Noisy Set
if(not(os.path.exists(testNoisyPNCC))):
    test_noisy_feat_df = pd.DataFrame(data=np.stack(test_noisy_feat), columns=feat_names)
    test_noisy_feat_df.to_csv(testNoisyPNCC)
else:
    test_noisy_feat_df = pd.read_csv(testNoisyPNCC,index_col=0)
y_test_noisy = np.stack(test_noisy_label)


model = xgboost.XGBClassifier()
model.fit(train_feat_df,y_train)

print("Train PNCC Clean Acc =", np.sum(y_train==model.predict(train_feat_df))/len(y_train))

print("Test PNCC Clean Acc =", np.sum(y_test_clean==model.predict(test_clean_feat_df))/len(y_test_clean))

print("Test PNCC Noisy Acc =", np.sum(y_test_noisy==model.predict(test_noisy_feat_df))/len(y_test_noisy))

print()
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(train_feat_df)
shap.summary_plot(shap_values, train_feat_df)

print()
confusion_matrix_clean = metrics.confusion_matrix(y_test_clean, model.predict(test_clean_feat_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_clean, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()

print()
confusion_matrix_noisy= metrics.confusion_matrix(y_test_noisy, model.predict(test_noisy_feat_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_noisy, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()
Train PNCC Clean Acc = 1.0
Test PNCC Clean Acc = 0.7360178970917226
Test PNCC Noisy Acc = 0.6311239193083573

[SHAP feature-importance summary plot]

[Confusion matrix, clean test set]

[Confusion matrix, noisy test set]
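The extract-once-then-cache pattern used for each feature set (compute, save to CSV, reload with `index_col=0` on later runs) can be sketched in isolation on toy data; the path and feature values below are hypothetical:

```python
import os
import tempfile
import numpy as np
import pandas as pd

# Toy per-utterance feature vectors standing in for extracted PNCCs
feat_names = ['pncc_' + str(n) for n in range(3)]
feats = [np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.5, 0.6])]

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, 'pncc3_train.csv')
    if not os.path.exists(cache_path):
        df = pd.DataFrame(data=np.stack(feats), columns=feat_names)
        df.to_csv(cache_path)                       # first run: extract and cache
    cached = pd.read_csv(cache_path, index_col=0)   # later runs: load the cache
    print(cached.shape)  # (2, 3)
```

`index_col=0` matters on reload: `to_csv` writes the row index as the first column, and without it the index would come back as a spurious feature.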

8. Feature Ensembling¶

In [388]:
# Ensemble any number of feature sets: list the CSV filename prefixes of the features to fuse
features = ["gfcc15","mfcc13_delta2","os2"]
train_dfs = []
test_clean_dfs = []
test_noisy_dfs = []
for feature in features:
    train_dfs.append(pd.read_csv(f'{feature_dir}/{feature}_train.csv', index_col=0))
    test_clean_dfs.append(pd.read_csv(f'{feature_dir}/{feature}_test_clean.csv', index_col=0))
    test_noisy_dfs.append(pd.read_csv(f'{feature_dir}/{feature}_test_noisy.csv', index_col=0))

train_feat_df = pd.concat(train_dfs, axis=1)
test_clean_feat_df = pd.concat(test_clean_dfs, axis=1)
test_noisy_feat_df = pd.concat(test_noisy_dfs, axis=1)

# Get Feature Names for Columns
feat_names = train_feat_df.columns

y_train = np.stack(train_label)
y_test_clean = np.stack(test_clean_label)
y_test_noisy = np.stack(test_noisy_label)

model = xgboost.XGBClassifier()
model.fit(train_feat_df,y_train)

print("Train Fusion Clean Acc =", np.sum(y_train==model.predict(train_feat_df))/len(y_train))

print("Test Fusion Clean Acc =", np.sum(y_test_clean==model.predict(test_clean_feat_df))/len(y_test_clean))

print("Test Fusion Noisy Acc =", np.sum(y_test_noisy==model.predict(test_noisy_feat_df))/len(y_test_noisy))

print()
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(train_feat_df)
shap.summary_plot(shap_values, train_feat_df)

print()
confusion_matrix_clean = metrics.confusion_matrix(y_test_clean, model.predict(test_clean_feat_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_clean, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()

print()
confusion_matrix_noisy= metrics.confusion_matrix(y_test_noisy, model.predict(test_noisy_feat_df))
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_noisy, display_labels = ['ROC','LES','DCB','PRV','VLD'])
cm_display.plot()
plt.show()
Train Fusion Clean Acc = 1.0
Test Fusion Clean Acc = 0.8120805369127517
Test Fusion Noisy Acc = 0.7838616714697406

[SHAP feature-importance summary plot]

[Confusion matrix, clean test set]

[Confusion matrix, noisy test set]
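The fusion step above relies on `pd.concat(..., axis=1)` aligning rows by index, so every cached CSV must list utterances in the same order. A minimal sketch with hypothetical two-utterance feature tables:

```python
import pandas as pd

# Toy per-utterance feature tables standing in for the cached feature CSVs
gfcc = pd.DataFrame({'gfcc_0': [1.0, 2.0], 'gfcc_1': [3.0, 4.0]})
mfcc = pd.DataFrame({'mfcc_0': [5.0, 6.0]})

# Column-wise concatenation fuses the feature sets; rows pair up by index
fused = pd.concat([gfcc, mfcc], axis=1)
print(fused.columns.tolist())  # ['gfcc_0', 'gfcc_1', 'mfcc_0']
```

Because each table was written from the same sorted file list, row `i` of every CSV describes the same utterance, which is what makes this column-wise fusion valid.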